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Jacobi  type  and  guarantees  that  the  result  will  provide  a  legitimate  probabilit 
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I .  INTRODUCTION 


Finite  mixtures  of  continuous  distributions  are  being  used  in  an 
increasing  number  of  stochastic  modeling  efforts  (see  Harris,  Kaylan  and 
Maltz,  1982;  Everitt  and  Hand,  1981).  The  most  common  distribution  applied 
appears  to  be  the  mixed  exponential,  where  the  probability  density  function 
would  be  written  as  a  convex  linear  combination  of  (say)  K  exponential 
subpopu 1 at ions 


K  *$^t  K  . 

b(t)  =  l  Z.  *  e  (  l  Z  -  1).  U; 

i=l  i=l 

Though  such  models  have  very  general  application,  there  are  some  serious 
restrictions  on  their  use.  Host  importantly,  the  functional  form  of  this  PDF 
requires  that  its  data  exhibit  quite  a  strong  monotone  decreasing  pattern 
since  b(t)  always  lies  somewhere  between  the  K  monotone  subdensities.  (The 
precise  mathematical  term  is  that  it  is  completely  monotone.)  Though  it  is 
true  that  the  mixed  exponential  (with  possibly  an  open  number  of  terms)  has 
broad  potential  coverage,  it  is  clearly  poorly  suited  for  fitting  data  with 
apparent  modes  and  indeed  any  set  of  frequencies  seeming  to  be  nonmonotone 
decreasing. 

But  a  variation  on  this  mixed  exponential  theme  can  be  adapted  to 
arbitrary  data  sets.  This  is  to  permit  the  linear  proportionality  constants 
{1^}  in  the  mixed  exponential  model  to  be  totally  arbitrary  in  sign,  while 
still,  of  course,  requiring  that  the  resultant  PDF  remain  nonnegative  over  its 
domain.  The  establishment  of  a  practical  numerical  procedure  for  doing 
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maximum- likelihood  estimation  for  these  more  general  mixed  exponential  models 
is  the  major  subject  of  this  work. 

We  refer  to  these  nonconvex  forms  as  GHs  for  generalized  hyperexpo- 

nentials.  It  has  frequently  been  noted  in  the  literature  (e.g.,  in  Neuts, 

1981)  that  the  GH  class  is  dense  in  the  set  of  all  CDFs  on  the  nonnegative 

reals.  By  this,  we  mean  that  there  is  always  a  GH  which  is  as  close  as  we  may 

wish  to  an  arbitrary  CDF  measured  with  respect  to  a  metric  suitably  defined  on 

the  distribution  space.  It  is  also  important  to  recognize  that  the  general 

class  of  linear  combinations  of  negative  exponential  functions  is  a  very 

complete  set  of  approximants  in  general  function  space.  Perhaps  the  most 

critical  characterization  of  this  coverage  is  the  fact  that  all  functions  in 

I^CO, «•)  can  be  approximated  arbitrarily  closely  by  a  finite  linear  combination 

•at  + 

of  functions  of  the  form  e  ,  0eR  .  (For  example,  see  Naylor  and  Sell,  1971, 
for  a  discussion  of  this  and  related  problems  on  the  Hilbert  space  of  square 
integrable  functions . ) 

The  GH  class  has  important  relationships  with  a  number  of  well-known 

comprehensive  families  of  exponentially  related  CDFs.  The  simplest  of  these 

are  the  generalized  Erlangs  (GE)  expressed  as  convolutions  of  independent  and 

not-necessarily-identical  exponential  random  variables.  When  the  means  of 

such  exponentials  are  allowed  to  come  in  conjugate  pairs  (so  that  their 

Laplace-Stieltjes  transforms  are  inverse  polynomials).  Smith  (1953)  calls  the 

family  Kn>  where  n  is  the  degree  of  the  defining  polynomial.  Cox  (1955) 

generalized  K  to  the  class  of  distributions  whose  transforms  are  rational 
n 

functions  (clearly  including  the  inverse  polynomials),  which  we  call  (with 
n  the  degree  of  the  denominator's  polynomial).  The  class  includes  all 
regular  Erlangs,  but  not  all  mixed  exponentials  and  mixed  Erlangs,  which  are, 
however,  members  of  Rn>  We  also  mention  the  generalized  phase-type 


J  v  -• 
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distributions  (PH)  popularized  by  Neuts  and  others  (see  Neuts,  1981),  which 

have  rational  transforms  as  well,  though  not  necessarily  of  the  inverse 

polynomial  form.  Thus,  we  may  symbolically  represent  the  relationship  of 

those  respective  families  as  GEcK  cR  and  PHcR  . 

n  n  n 

To  position  the  GH  class  together  with  the  others,  we  first  observe  that 

any  member  of  R  with  real  and  distinct  zeroes  for  its  transform  denominator 
n 

polynomial  is  a  generalized  hyperexponential.  But  not  all  GH  are  phase  types, 
since  there  are  linear  combinations  of  exponentials  which  are  densities  but 
cannot  be  derived  as  the  time  to  absorption  of  any  Markov  chain  (see  Dehon  and 
Latouche,  1982). 

The  numerical  procedure  developed  for  estimating  the  parameters  of  the 
generalized  hyperexponentials  has  been  built  up  from  previous  work  on  exponen¬ 
tial  and  Weibull  mixtures.  Throughout  we  assume  that  the  data  sampling  is 
complete  so  that  all  random  times  are  fully  observed.  In  the  event  that  there 
are  incomplete  data  observations,  the  algorithm  is  easily  altered. 

Maximum- likelihood  estimation  is  the  method  selected  mainly  because, 
under  fairly  general  conditions,  it  enjoys  the  important  limiting  statistical 
properties  of  efficiency,  normality,  and  unbiasedness.  Furthermore,  the  MLEs 
are  consistent,  invariant,  and  are  functions  of  sufficient  statistics  if  they 
exist.  When  sufficiency  and  unbiasedness  both  hold,  the  MLEs  are  also  of 
minimum  variance. 

A  first  key  observation  is  that  it  is  not  possible  to  obtain  explicit 
formulas  for  the  maximum -likelihood  estimators  of  parameters  involved  in  mixed 
exponential  densities  by  taking  the  partial  derivatives  and  equating  them  to 
zero.  Hence  we  resort  to  other  optimization  methods  and  numerical  techniques. 
Furthermore,  we  need  to  take  into  account  a  set  of  constraints  in  addition  to 
the  objective  function.  The  mixing  proportions  and  scale  parameters  must 
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satisfy  some  simple  linear  relationships  and  there  may  exist  other  constraints 
related  to  the  sub -population  parameters.  Note  that  the  constraints  are 
generally  of  a  linear  type;  hence  the  problem  can  be  described  as  a 
mathematical  program  with  a  nonlinear  objective  function  and  linear 
constraints . 

II.  PROBLEM  STATEMENT 

The  target  criterion  function  in  our  maximum -likelihood  optimization 
problem  is  the  usual  joint  density  function  for  a  random  sample  from  the 
population  governed  by  the  b(t)  of  (1).  As  is  common,  it  is  much  easier  in 
this  situation  to  work  with  the  logarithm  of  the  likelihood  function.  Thus, 
if  we  write  the  likelihood  for  the  random  sample  t^, . . . ,tN  as 

N  N  K  -#  t. 

£(a)  =  H  b(t.;a)  =11  t  *.*.e  1  J  (2) 

J=1  J  j=l  i=l 

where  e  is  the  vector  of  parameters  (which  may  include  K) ,  then  its  logarithm 
is  expressed  as 

N 

L(o)  =  l  In  b(t.;o)  .  (3) 

j=l  J 

The  MLE  problem  for  the  (generalized)  mixture  may  then  be  formulated  as  the 

nonlinear  constrained  optimization  problem: 

max  L(a) 
a 

subject  to 

a  e  S  =  (a|Zyi  =  1;^  2  0}  .  (4) 

Under  the  standard  mixed-exponential  regime,  each  >  0,  and 

would  be  real  and  also  greater  than  0.  The  most  efficient  algorithm  available 


for  the  solution  of  this  problem  is  due  to  the  joint  efforts  of  Kaylan  (1978) 
and  Kaylan  and  Harris  (1981),  and  Mandelbaum  (1982)  and  Mandelbaum  and  Harris 
(1982).  It  is  a  sequential  numerical  procedure  whose  principal  sequence  of 
points  is  generated  using  nonlinear  Jacobi- like  iterations,  but  with  an 
embedded  subsequence  generated  by  Armijo  steps.  Details  will  be  provided 
below. 

Because  the  mixing  parameters  (JL)  may  be  negative  in  our  problem,  the 
Kaylan/Mandelbaum/Harris  (K/M/H)  algorithm  has  been  carefully  altered,  though 
its  basic  approach  is  retained.  Three  major  changes  in  the  algorithm  were 
necessary.  First,  additional  code  had  to  be  added  to  make  sure  that  the 
density  function  b(t)  did  not  become  negative.  The  second  and  third 
alterations  were  required  to  ensure  that  the  algorithm  generates  ascent 
directions  for  the  {0^}  and  {i\},  respectively,  when  the  sign  restriction  on 
the  (2Ti>  is  relaxed. 


III.  DESCRIPTION  OF  THE  ALGORITHM 

The  basis  of  the  algorithm  is  the  first-order,  nonpost-mortem  method  for 
the  convex  mixing  of  Weibull  distributions  described  in  Mandelbaum  and  Harris 
(1982).  Exponential  distributions  are,  of  course,  Weibull  with  shape 
parameter  one.  This  numerical  scheme  calculates  the  parameter  values  oV+^ 
at  the  (v+l)st  iteration  as 

aV+1  =  aV  +  sVd(v)  (v  =  0,1,2,...),  (5) 

where 

V  V 

a  is  the  vector  of  parameter  values  at  the  (v)th  iteration,  i.e.,  a 
=  Uj,  *2  » •  •  *  »*K'  *1‘  *2’ ' '  •  ,yK-l*  (yK  obtains  from  the 


sum-to-one  constraint); 


d(v)  is  the  ascent  direction  generated  at  the  (v)th  iteration  (described 

below) ; 
v  -i(v) 

s  —  2  is  the  step  size  at  the  (v)th  iteration;  and  i(v)  is  the 

smallest  nonnegative  integer  such  that  L(oV+^)  -  L(aV)  £  e(v)  £ 
0  (e(v)  will  be  described  below),  aV+*  e  S  and  b(t)  is 

nonnegative. 

The  algorithm  assumes  is  user-supplied  and  terminates  at  the  first 

iteration  v  for  which  i(v)  =  i  (=  20,  currently).  That  is,  if  twenty 

mdx 

bisections  of  the  step-size  do  not  yield  an  aV+*  which  is  feasible,  which 
forms  a  density  b(t),  and  which  evaluates  to  an  increase  in  the  log- likelihood 
function,  the  algorithm  terminates  at  the  presumed  local  maximum  aV. 

The  ascent  directions  d(v)  are  calculated  in  one  of  two  ways,  depending 
on  the  value  of  the  iteration  index  v  and  a  constant  W  (presently  40).  If  v 
is  not  a  multiple  of  40,  the  regular  iteration  is  employed  as  a  variation  of 
the  nonlinear  Jacobi  step  (see  Ortega  and  Rheinboldt,  1970)  formed  by  making 
use  of  VL(a)  =  0  (see  Mandelbaum  and  Harris,  1982,  for  details).  Otherwise, 
if  v  is  a  multiple  of  40,  then  a  gradient -based  Armijo  step  is  used. 

The  necessity  for  the  two  types  of  iterations  arises  in  the  development 

of  the  K/M/H  algorithm  from  the  use  of  step-size  bisection  in  the  mapping  from 

v  v+l  v 

a  to  a  .  Without  bisection,  i.e.,  if  s  =1  for  all  v,  the  original 

algorithm's  mapping  (5)  is  closed  since  all  functions  involved  are  continuous, 

and  hence  oV  belongs  to  a  compact  set.  But  with  bisection,  continuity  might 

occasionally  be  violated  and  convergence  prevented.  To  avoid  this  potential 

pathology,  the  Armijo  step,  which  is  convergent  under  bisection  (see  Armijo, 

1966)  was  included  to  embed  a  convergent  subsequence  of  points  into  the 

principal  sequence. 


The  potential  violation  of  continuity  may  be  further  aggravated  by 


allowing  unrestricted  signs  on  each  in  the  current  algorithm.  Though 
empirically,  the  {JL}  appear  not  to  change  from  their  initial  signs  very 
often,  in  theory,  the  potential  change  in  sign  could  cause  discontinuity  in 
the  functions  comprising  the  principal  mapping  and  hence,  hinder  convergence 
of  the  principal  sequence.  The  gradient-based  Armijo  step  would  not  be 
subject  to  this  discontinuity  and  hence,  serves  to  provide  a  convergent 
subsequence  of  points  for  this  case  as  well. 


The  ascent  direction  calculation  of  the  regular  iteration  is  where  it  has 
been  necessary  to  alter  the  Kaylan/Mandelbaum/Harris  algorithm.  The  original 
algorithm,  at  the  (v)th  iteration,  is  defined  by  solving  the  equations 


3o,(ol’  02’-,*,ai-l’  ai’  °i+l,,,,a2K-l)  “  °  (i  *  1,2,...,2K-1)  (6) 


for  each  a^,  i  =  1,2, . . .  ,2K-1,  in  sequence,  and  setting  d(v)  =  a  -  aV. 
To  allow  the  equations  of  (6)  to  be  resolved  as 

«i  =  h.(«V)  (i  —  1,2, .. . ,2K-1),  (7) 

we  allow  to  be  used  on  the  right-hand  side  if  a  numerical  procedure 
would  have  been  needed  to  isolate  o^.  The  explicit  form  of  the  equations  of 
(7)  and  the  details  of  their  derivation  can  be  found  in  Mandelbaum  (1982)  and 


Mandelbaum  and  Harris  (1982).  It  should  be  noted  here  that  although  the 


regular  iteration  makes  use  of  gradient  information,  it  is  not  a  gradient 
step,  since  the  equations  (6)  are  solved  sequentially  rather  than 
simultaneously.  The  gradient  (steepest  ascent)  direction  would  be  that 


derived  through  solving  dL/da  =  0  where  a  and  0  are  vectors. 

The  modified  regular  iteration  calculates  the  vector  a  as  previously,  but 


forms  the  ascent  direction  d(v)  differently.  Recalling  that 


i  ■ 


“  *  (“l  =  *1*  a2  =  ^2* ' '  ‘  ,aK  =  *K’  °K+1  =  *1’  aK+2  *2  ’  * -  ‘  ,a2K-l  =  JfK-l) 

(8) 

where  K  is  the  number  of  terms  in  b(t)  (3T^  is  always  calculated  as  one  minus 
the  sum  of  the  K-l  {3\}),  the  elements  of  the  ascent  direction  are 
calculated  as  follows.  The  d^(v),  i  =  1,2,..., K,  (i.e.  ,  the  elements  of  the 
ascent  direction  associated  with  the  exponential  subpopulation  shape 


parameters  t^})  are 


d±(v)  = 


a.  -  aV  if  y.  >  0 

li  l 


o:  -  a.  if  y.  <  0 
11  1 


(i  =  1,2,... ,K)  (9) 


To  calculate  the  (K+l)st,  (K+2)nd,  through  the  (2K-l)th  elements  of  d^Cv), 
(i.e.,  the  elements  of  the  ascent  direction  associated  with  the  mixing 
coefficients  (y^  i  =  1,2, . . . ,K-1>) ,  we  first  calculate  a  test  parameter  s 


K  (y  -  y*)2 

s  »  I  — - - - 

i-1  y^ 


If  s  >  0,  d.(v)  =  a.  -  aj  (i  =  K+l.K+2, . . . ,2K-1) 


Else  (s  <  0),  di(v)  =  -  a.  (i  =  K+l.K+2 . 2K-1) 

The  modifications  in  the  ascent  direction  calculation  for  the  regular 
iteration  obtain  from  a  slight  extension  of  the  ascent  direction  proof  for  the 
original  algorithm  (see  Mandelbaum,  1982,  or  Mandelbaum  and  Harris,  1982). 
The  sufficient  condition  for  proof  of  ascent  direction  is 

d(v)  •  VL(aV)  >  0.  (12) 

From  a  simple  extension  of  the  original  proof,  we  find  that 


8 


i  „  N  #.  e  -  *.  t, 

tv  re's2  y  1  1  .1 

‘i  <V  *  b(t.) 

J  J  (13) 

where  N  is  the  number  of  observed  lifetimes.  Examining  the  RHS  of  (13),  since 
the  squared  terms  are  nonnegative  and  the  terms  of  the  summation  are  by 
definition  nonnegative,  the  sign  of  the  RHS  is  determined  by  the  sign  of 
Hence,  (13)  or  its  negative  will  always  have  a  RHS  i  0,  and  thus  rule 
(9)  is  satisfied. 

As  with  rule  (11),  the  ascent  direction  equations  from  which  (11)  is 
derived  handle  the  {y^,  i  =  1,...,K-1)  together.  Since  under  rule  (9),  the 
first  K  elements  of  the  inner  product  (12)  are  nonnegative,  the  sufficient 
condition  for  proof  of  an  ascent  direction  reduces  to 


l  d  (v)  £  0. 

i=K+l  1  3°i 

Once  again,  a  simple  extension  of  the  original  proof  yields 


k-i  v.  k  (y.  -  yv)2 

s'  ■  (£,  1*1  •  *1)  ^  -  N  £  -  ~i~v  — 

i=i  i  1=1  y. 


where  N  is  the  number  of  observations.  Noting  that  in  (10),  s  =  s  /N,  and 
since  s'  or  -s'  will  always  be  nonnegative,  rule  (11)  obtains  directly. 

The  ascent  direction  calculation  of  the  Armijo  iteration  of  the  new 
algorithm  is  identical  to  that  of  the  original  algorithm;  that  is, 


IV.  SOME  COMPUTATIONAL  EXPERIENCE 


Three  examples  were  created  for  testing  the  algorithm  on  the  University 
of  Virginia's  CDC  Cyber  855.  Samples  of  size  250  were  generated  as  random 
samples  according  to  each  of  three  assumed  densities,  namely, 

(1)  g(t)  =  .5(e_t)  +  .25(3e'3t)  +  .25(5e"5t); 

(2)  g(t)  =  4(2e'2t)  -  4(3e‘3t)  +  l(4e'4t); 

(3)  g(t)  =  2(2e”2t)  -  2(4e'4t)  +  l(6e‘6t). 

Plots  of  these  densities  follow  in  Figures  1-3;  the  particular  forms  were 
chosen  strictly  for  illustration.  The  first  one  is  a  simple  example  of  a 
completely  monotone,  ordinary  exponential  mixture,  while  the  second  one  is 
monotone  decreasing  but  with  two  points  of  inflection.  The  third  case,  on  the 
other  hand,  is  a  unimodal  density  with  shape  determined  by  the  relative  size 
of  its  negative  middle  Y-value. 

The  random  variates  for  the  first  density  were  easily  created  in  the 
usual  composition  or  mixture  way.  Given  the  descending  order  of  the  scale 
parameters  (say  =  1/^)  1,  1/3,  1/5,  we  know  that  the  optimization 
routine  should  find  a  unique  solution,  for  the  algorithm  collapses  to  the 
K/M/H  method  in  that  case.  Two  test  subcases  were  run,  starting  from  equal 
sets  of  mixture  probabilities  but  with  very  different  scale  parameters.  The 
algorithm  came  up  with  nearly  the  same  answer  in  almost  equal  time.  Note  that 
although  the  algorithm  required  98  and  97  iterations,  these  are  done  very 
quickly,  with  the  total  run  requiring  less  than  10.4  seconds  cpu  time. 

The  salient  results  of  the  two  runs  for  test  case  number  1  are  displayed 
in  Table  1,  and  the  estimated  density  resulting  from  the  first  starting  point 
is  plotted  in  Figure  4.  Though  the  simulated  data  set  led  to  different 


ACTUAL  DENSITY  FOR  TEST  CASE  NO.  1 
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Figure  3:  ACTUAL  DENSITY  TOR  TEST  CASE  NO.  3 


Figure  4:  ESTIMATED  DENSITY  FOR  TEST  CASE  NO.  1 


estimates  than  the  original  values,  we  know  that  the  respective  values  would 
eventually  get  closer  with  increased  sample  sizes.  But  the  most  important 
point  here  is  that  the  algorithm  worked  well. 


Table  1 

TEST  CASE  NUMBER  1 

Actual  density  g(t)  =^et+|e^t+|e^t 

with  E[T]  =  19/30  *  .63 
and  Var [T]  =  607/900  =  .674. 
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The  random  variates  for  the  second  and  third  cases  could  not  be  created 


by  composition  because  of  the  negative  mixing  parameters.  Instead,  each  of 
these  two  densities  was  rewritten  in  mixed  generalized  Erlang  form,  from  which 
the  variates  were  easily  derived  using  composition.  When  written  in 
convolution  format,  these  specific  equivalences  turned  out  to  be 


-2t  -3t  -At  -3t  -4t 

g(t)  =  |  (2e  *  3e  *  Ae  )  +  5  (3e  *  Ae  )  (Case  2) 

and 

-2t  -At  -6t  -6t 

g(t)  =  |  (2e  *  Ae  *  6e  )  +  3  (6e  )  (Case  3) 


(The  equalities  are  verified  in  the  simplest  way  by  converting  to  Laplace 
transforms. ) 

For  these  two  test  cases,  we  ran  from  three  different  starting  points, 
one  of  which  was  the  actual  answer,  while  another  was  chosen  to  be  "close". 
The  results  are  displayed  in  Tables  2  and  3. 

For  the  second  case,  we  observe  that  the  first  and  third  starting  points 
led  to  answers  which  look  like  they  may  well  be  the  same  (and  close  to  the 
actual).  There  are  differences  in  values,  but  we  theorize  that  the  first 
answer  would  move  toward  the  third  if  the  algorithm's  stopping  rule  were 
tightened.  But  it  would  appear  that  the  second  starting  point  did  lead  to  a 
truly  alternative  local  solution.  For  purposes  of  comparison,  in  Figure  5,  we 
offer  a  plot  of  the  density  functions  which  result  from  the  first  two 
solutions.  We  see  there  that  the  tw<  unimodal  densities  are  relatively  close 
to  each  other. 

For  this  second  test  case,  note  that  cpu  seconds  ranged  from  a  high  of 
13.8  seconds  to  a  low  of  A. 8.  Through  all  three  subcases,  the  relative  use  of 


Table  2 


TEST  CASE  NUMBER  2 

Actual  density  g(t)  =  4(2e"2t)  -  4(3e-3t)  +  l(4e'4t) 

with  E[T]  =  .916 
and  Var[T]  =  .3544 
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Table  3 


TEST  CASE  NUMBER  3 

Actual  density  g(t)  =  2(2e  ^)  -  2(4e  +  l(6e  ^t) 

with  E [T]  =  .6 
and  VarJT]  =  .361 
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the  bisection  rule  was  comparable,  most  frequently  resulting  from  a 
non -improvement  in  the  objective  function. 

Of  course,  it  should  be  understood  that  in  this  work  we  offer  no 
definitive  strategies  for  locating  the  global  solutions  from  amongst  the 
locals.  The  ability  to  arrive  at  the  global  solution  likely  depends  on  the 
quality  of  the  starting  point.  In  these  test  cases,  all  starting  points  were 
selected  in  an  arbitrary  fashion.  Instead,  we  could  use  a  moment  matching 
technique  for  the  initial  point,  or  possibly  even  a  numerical  curve  fitting 
technique  (with  parameters  adjusted  up  or  down  to  make  sure  it  is  a  density). 
Two  such  numerical  approaches  are  documented  in  McDonough  and  Huggins  (1968) 
and  Harman  and  Fairman  (1973). 

The  results  for  our  third  test  cases  appear  somewhat  different  in  the 
sense  that  we  seem  to  have  found  three  local  solutions.  For  comparison  here, 
we  plotted  the  first  two  experimental  densities  in  Figure  6  and  again  see  that 
they  are  close  to  each  other.  The  third  one  is  also  quite  similar,  but  we 
opted  not  to  plot  it,  noting  instead  that  it  is  almost  identical  to  the 
originating  density  as  plotted  in  Figure  3.  This  time,  cpu  seconds  ranged 
from  2.6  to  71.7.  Again,  there  is  a  preponderance  of  bisection  because  of 
non- improvement ,  though  negative  i)  parameters  caused  44  out  of  692  bisections 
in  subcase  i.  Overall,  the  algorithm  again  performed  well. 
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Figure  6:  ALTERNATIVE  DENSITIES  FOR  TEST  CASE  NO.  3 
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V.  CONCLUDING  REMARKS 


To  close,  we  repeat  our  observation  that  the  algorithm  worked  well.  It 
is  clearly  also  well  suited  for  the  MLE  of  parameters  from  ordinary 
exponential  mixtures  since  it  is  guaranteed  then  to  find  the  global  solution. 

Primary  areas  of  possible  future  work  include  an  exploration  of  the 
statistical  properties  of  these  estimators,  a  firmer  strategy  for  selecting 
"good  starting  points",  and  the  derivation  of  possible  procedures  for 
determining  the  optimum  number  of  terms  to  include.  This  last  concern  is 
akin  to  the  step-wise  regression  problem  and  is  often  mentioned  in  the 
literature  as  a  topic  of  special  interest.  Our  current  code  can  handle  any 
number  of  terms,  but  that  number  must  be  specified  beforehand. 
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